ABSTRACT

Phylemon 2.0 is a new release of the suite of
web tools for molecular evolution, phylogenetics,
phylogenomics and hypothesis testing. It has
been designed in response to the increasing
demand for molecular sequence analyses from
expert and non-expert users. Phylemon 2.0 has
several unique features that differentiate it
from other similar web resources: (i) it offers an
integrated environment that enables evolutionary
analyses, format conversion, file storage and
editing of results; (ii) it suggests further analyses,
thereby guiding users through the web server;
and (iii) it allows users to design and save
phylogenetic pipelines to be used over multiple genes
(phylogenomics). Altogether, Phylemon 2.0
integrates a suite of 30 tools covering sequence
alignment reconstruction and trimming; tree
reconstruction, visualization and manipulation; and
evolutionary hypothesis testing.

INTRODUCTION 

Phylogenetic analysis and model-based hypothesis testing 
are essential elements in current molecular evolution 
studies (1,2). Web servers for phylogenetic and
evolutionary analyses range from those running single programs
to those integrating multiple tools. Among the first
are servers that execute multiple sequence alignment
(MSA) tools such as ClustalW (3)
(http://www.ebi.ac.uk/clustalw/), or sophisticated programs
to test molecular adaptation such as the HyPhy environment (4)
(http://www.datamonkey.org/). In the second category, resources
such as the 'Pasteur server' (e.g. see
http://bioweb.pasteur.fr/seqanal/phylogeny/intro-uk.html),
Phylogeny.fr (http://www.phylogeny.fr/) (5), CIPRES
(http://www.phylo.org/) and Phylemon (6) developed the concept of
integrated platforms, which implement different phylogenetic
analysis programs in a single server.

Phylemon was originally developed in 2007 as a web 
server providing a common framework to run the most 
frequent analyses on DNA and protein sequences from a 
phylogenetic and evolutionary perspective. Phylemon 2.0 
covers a wide, yet selected, range of programs, integrating 
over 30 different tools for phylogenetic and evolutionary 
analyses. Phylemon 2.0 has several unique features that
differentiate it from other resources: (i) it offers an
integrated environment that enables the concatenation
of evolutionary analyses, stores results and
handles format conversions transparently; (ii) once an
output file is produced, Phylemon suggests other possible 
analyses that could logically follow, thus guiding the user 
through the web server; and finally (iii) users can build and 
save complete pipelines to be automatically used on many 
genes in subsequent sessions (phylogenomics). 

The main objective of Phylemon is to provide expert
and non-expert users with all necessary applications in a single
integrated web framework that guides them through the
whole evolutionary analysis of their sequences. Here, we outline the
main characteristics of the server and the new developments
added in this version. Phylemon 2.0 is accessible
at http://phylemon.bioinfo.cipf.es.

OUTLINE OF THE PROGRAM

Phylemon 2.0 resources are organized in five major 
sections: Alignment, Phylogeny, Evolutionary Tests, 
Pipeliner and Utilities. Phylemon verifies the format of 
the uploaded input file (non-aligned sequences, aligned 
sequences, distance matrix, tree or pipeline format) and 
stores files for the exclusive use of tools reading that 
format. Users can rename files to help their recognition 
throughout the project. 
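
As an illustration of this kind of input check, the following is a minimal sketch, not the server's actual validation code. The function names `parse_fasta` and `guess_format` are hypothetical, and the heuristic covers only three of the five formats mentioned above (non-aligned sequences, aligned sequences and trees); distance matrices and pipeline files are left out.

```python
def parse_fasta(text):
    """Parse Fasta text into a dict mapping sequence name to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def guess_format(text):
    """Heuristically classify text as 'tree', 'aligned', 'unaligned' or 'unknown'.

    Assumption: a Newick tree starts with '(' and ends with ';', and a Fasta
    file with several sequences of identical length is treated as aligned.
    """
    stripped = text.strip()
    if not stripped:
        return "unknown"
    if stripped.startswith("(") and stripped.endswith(";"):
        return "tree"
    if stripped.startswith(">"):
        seqs = parse_fasta(stripped)
        lengths = {len(s) for s in seqs.values()}
        if len(seqs) > 1 and len(lengths) == 1:
            return "aligned"
        return "unaligned"
    return "unknown"
```

A stricter check would also validate the residue alphabet; the equal-length rule alone can misclassify unaligned sequences that happen to share a length.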

A basic phylogenetic analysis consists of: (i) the 
proposal of a hypothesis about positional homology in a 
multiple alignment of sequences; and (ii) the search for a 
tree topology and branch lengths, the main components of 
a phylogeny. Once the phylogenetic hypothesis is resolved,
users may test additional, specific hypotheses
related to their sequences, including molecular clock
behavior, estimation of synonymous and non-synonymous
distances, maximum likelihood (ML) based
parameter estimation, statistical support among
competing topologies, clades or adaptive events acting
on the sequences. For these purposes, Phylemon 2.0
groups different tools under the web tabs: Alignment,
Phylogeny and Evolutionary Tests (see sections below).

Table 1 lists all the programs implemented in Phylemon 
2.0 and the main connections among them. 

Alignment 

Phylemon 2.0 integrates four different programs for the 
alignment of molecular sequences: ClustalW v2.0.10 (3), 
Muscle v3.7 (7), Lagan v2.0 and M-Lagan v2.0 (8). The
first two are among the most frequently used programs 
for MSA. In this version of the server, we added Lagan 
(Limited Area Global Alignment of Nucleotides) and 
Multi-Lagan, which run efficient algorithms specifically 
developed to produce pairwise and multiple alignments 
of long genomic sequences, respectively. 

Table 1. Programs available in the Phylemon 2.0 web server

No. T/U  Program              Version      Function                                                            Output to program

Alignment
1   T    ClustalW             2.0.10       Multiple alignments. DNA and protein sequences                      5, 8, 9, 11-16, 21-23, 26, 28
2   T    Muscle               3.7          Multiple alignments. DNA and protein sequences                      5, 8, 9, 11-16, 21-23, 26, 28
3   T    Lagan                2.0          Pairwise alignment. Long and distant genomic sequences              5, 8, 9, 11
4   T    M-Lagan              2.0          Multiple alignments. Long and distant genomic sequences             5, 8, 9, 11-16, 21-23, 26, 28
5   U    TrimAl               1.3          Automated trimming of MSAs                                          8, 9, 11-16, 21-23, 26, 28
6   U    CDS-ProtAl           1.0          Alignment of DNA coding sequence using protein template             8, 11-16, 22, 28-30
7   U    ConcatenAl           1.0          Concatenation of MSAs                                               8, 9, 11-16, 21-23, 25, 26
8   U    ReadAl               1.3          File format conversion                                              1-5, 8, 9, 11-16, 21-23, 25-30

Phylogeny reconstruction
9   T    Seqboot              Phylip 3.68  Bootstrap, jackknife or permutation resampling methods              11-16
10  T    Consense             Phylip 3.68  Consensus tree reconstruction                                       20
11  T    Dnadist              Phylip 3.68  DNA pairwise distances computation                                  17, 18
12  T    Protdist             Phylip 3.68  Protein pairwise distances computation                              17, 18
13  T    DnaML                Phylip 3.68  ML tree reconstruction from DNA data                                10, 20
14  T    ProML                Phylip 3.68  ML tree reconstruction from protein data                            10, 20
15  T    DnaPars              Phylip 3.68  Maximum parsimony tree reconstruction from DNA data                 10, 20
16  T    ProtPars             Phylip 3.68  Maximum parsimony tree reconstruction from protein data             10, 20
17  T    Neighbor             Phylip 3.68  Tree reconstruction using UPGMA and NJ methods                      10, 20
18  T    Fitch                Phylip 3.68  Tree reconstruction using LS and ME methods                         10, 20
19  U    TreeDist             Phylip 3.68  Distance computation among tree topologies                          -
20  U    ETE                  2.1 beta     Tree visualization                                                  -
21  T    PhyML-Best-AIC-Tree  1.0          ML tree with the best model fitting data under AIC estimation       20
22  T    PhyML                3.00         Maximum likelihood analysis (MLA) of DNA & protein data             20
23  T    Tree-Puzzle          5.2          MLA of DNA & protein sequences using quartets                       20
24  T    MrBayes              3.1.2        Bayesian phylogenetic analysis of DNA and protein sequences         20

Evolutionary tests
25  T    ProtTest             1.4          ML fitting of protein sequences to evolutionary models              -
26  T    jModelTest           0.1.0        Model testing and phylogeny averaging                               -
27  T    RRTree               1.1.11       Relative rate test                                                  -
28  T    SLR                  1.3          Site-wise analysis of positive and negative selection               -
29  T    YN00                 PAML 4.4c    Pairwise analysis of positive selection (PS) with counting methods  -
30  T    CodeML               PAML 4.4c    MLA of PS using sites, branch and branch-site models                -

Programs are assembled in three main blocks: (i) alignment and file format conversion; (ii) phylogenetic reconstruction; and (iii) evolutionary tests.
New resources in this version are shown in italics.
T/U: tools/utilities.
Pplin: programs able to run in the Pipeliner (nine programs are flagged Y).

Lagan and M-Lagan each use a single input file in Fasta
format containing two (Lagan) or more (M-Lagan) sequences. Both
programs use the Translated Anchoring option, which
translates coding regions to anchor the sequences. This is
useful when distantly related sequences are compared (e.g.
primates and fishes). The Reverse Complement option in
Lagan is useful to search for positional homology on the
opposite DNA strand of the second sequence. Both
programs produce a single output file of aligned sequences
in Fasta format. Multiple alignments in Phylemon 2.0 can
be sent to distance, parsimony and statistical (ML and
Bayesian) tree reconstruction tools. Format conversion or
alignment editing can be performed using ReadAl and
TrimAl (see 'Utilities' section).
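
To make the Reverse Complement option concrete, the sketch below shows the transformation that such an option would apply to the second sequence before alignment. It is an illustration only and does not reproduce Lagan's code; the function name `reverse_complement` is hypothetical.

```python
# Translation table for the four DNA bases (both cases); N stays N.
_COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def reverse_complement(seq):
    """Return the opposite strand of a DNA sequence, read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]
```

Aligning the first sequence against `reverse_complement(second)` is how a homologous region lying on the opposite strand can be detected.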

Phylogeny 

Phylemon 2.0 incorporates distances, parsimony, ML and 
Bayesian methods for tree reconstruction. Distance and 
parsimony methods for DNA or protein sequence data 
are provided by algorithms of the Phylip package (9) 
v3.68: DnaDist, ProtDist, DnaPars and ProtPars, respect- 
ively. ML analysis can be performed using Phylip 
(DnaML, ProML), Tree-Puzzle v5.2 (10) and PhyML 
v3.0 (11,12) programs. Bayesian phylogenetic analysis 
runs in MrBayes (13) v3.1.2. Users have the option to
interact with the program, thus monitoring the progress
of the analysis. The program allows the user to specify sump
and sumt parameters. Users who wish to build the
MrBayes command block can fill in a form that summarizes
the main parameters. A useful list of command line
parameters is available on the fly.

PhyML-Best-AIC-tree v1.02b is a new tool in
Phylemon 2.0. It is a Python script that reconstructs
ML trees using the DNA or protein model with the best
AIC score (14).
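
The selection step can be sketched as follows. This is not the PhyML-Best-AIC-tree script itself; it only illustrates the standard criterion AIC = 2k - 2 lnL, where k is the number of free parameters and lnL the maximized log-likelihood. The function names and the example values are hypothetical.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 lnL (lower is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def best_model_by_aic(fits):
    """Pick the model with the lowest AIC.

    fits maps a model name to a (log_likelihood, n_params) pair.
    Returns the best model name and the AIC score of every model.
    """
    scores = {name: aic(lnl, k) for name, (lnl, k) in fits.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical likelihoods and parameter counts, for illustration only.
example_fits = {"JC": (-1250.0, 0), "HKY": (-1200.0, 4), "GTR": (-1198.5, 8)}
best_name, aic_scores = best_model_by_aic(example_fits)
```

In this made-up example the richer GTR model has the highest likelihood, but its extra parameters are penalized, so HKY obtains the lowest AIC.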

Evolutionary tests 

For users interested in evolutionary hypothesis testing,
Phylemon 2.0 collects tools for Model Selection,
Molecular Adaptation and Relative Rate Tests.

In this version, we added jModelTest (15) v0.1 and a
new version of ProtTest (16), v1.4, to improve the search
for the model of evolution that best explains DNA and
protein data. One of the interesting results of
jModelTest is the average topology obtained from the models
within the 95% confidence interval. This topology can be used
as the intree file required by all programs testing for
molecular adaptation in Phylemon 2.0.
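
One common way to define such a set of models is through cumulative Akaike weights. The sketch below assumes that construction; whether it matches jModelTest's internal procedure exactly is not confirmed by the text, and the averaging of topologies itself is not shown. The function names are hypothetical.

```python
import math


def akaike_weights(aic_scores):
    """Convert AIC scores (dict name -> AIC) into normalized Akaike weights."""
    best = min(aic_scores.values())
    relative = {m: math.exp(-0.5 * (a - best)) for m, a in aic_scores.items()}
    total = sum(relative.values())
    return {m: r / total for m, r in relative.items()}


def confidence_set(weights, level=0.95):
    """Return the models, from heaviest to lightest, whose cumulative
    weight first reaches the requested level."""
    chosen = []
    cumulative = 0.0
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    for model, weight in ranked:
        chosen.append(model)
        cumulative += weight
        if cumulative >= level:
            break
    return chosen
```

Trees inferred under the models in this set would then be combined, each weighted by its Akaike weight, to obtain an averaged topology.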

Adaptation tests on protein-coding DNA sequences run
in Phylemon 2.0 by means of the Site-wise Likelihood Ratio
(SLR) test program v1.3 (17) and CodeML and YN00 (18)
from PAML v4.4c (19). Finally, deviations from the
molecular clock hypothesis can be tested using the
RRTree (20) program v1.1.11. RRTree computes
relative rate tests among user-defined lineages with a
weighted or unweighted scheme of species based on the
tree topology provided by the user. The program accepts
different parameters: the number of synonymous substitutions
and synonymous transitions per synonymous
site (Ks and As, respectively), the number of
non-synonymous substitutions and non-synonymous
transversions per non-synonymous site (Ka and Ba,
respectively) and, finally, the number of synonymous
transversions per 4-fold degenerate site (B4). The Kimura
two-parameter (K2P) (21) and Jukes and Cantor (JC)
models are available for non-coding DNA sequences.
For protein sequences, RRTree computes a modification
of the JC model.
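
For reference, the two non-coding DNA distance corrections named above can be sketched as follows. This shows only the textbook JC and K2P formulas applied to one pair of aligned sequences; it is not RRTree's code and does not implement the relative rate test. The function names are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_differences(seq1, seq2):
    """Return (sites, transitions, transversions) for two aligned sequences.

    Columns containing a gap or an ambiguous base in either sequence are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return sites, transitions, transversions


def jc_distance(seq1, seq2):
    """Jukes-Cantor distance: d = -3/4 ln(1 - 4p/3)."""
    sites, ts, tv = count_differences(seq1, seq2)
    p = (ts + tv) / sites
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance: d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q),
    where P and Q are the proportions of transitions and transversions."""
    sites, ts, tv = count_differences(seq1, seq2)
    P, Q = ts / sites, tv / sites
    return -0.5 * math.log(1.0 - 2.0 * P - Q) - 0.25 * math.log(1.0 - 2.0 * Q)
```

Both functions raise an error for saturated or empty comparisons (no valid sites, or a logarithm of a non-positive value), which a production tool would need to handle explicitly.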

Users interested in ML comparison of topologies
(paired-sites test) can select the 'evaluation of user-defined
trees' option in the Tree-Puzzle program.



PIPELINE AND PHYLOGENOMICS 

Phylogenomic analyses sometimes involve repeating a
certain set of analyses over several groups of genes.
In such cases, it is necessary to apply the same set of
phylogenetic algorithms to different sequence data using
a single pipeline of tools. To satisfy this requirement, we
developed the Pipeliner tool. Users interested in this kind
of study can upload a zip file containing the sequences to run
in a pipeline. The previous version of Pipeliner provided basic
programs derived from the Phylip package, so that pipelines
such as ClustalW, Seqboot, DnaDist/ProtDist, Neighbor and
Consense, used in that order, could produce a phylogenetic
reconstruction with bootstrap values.
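
The resampling step of such a pipeline can be illustrated with a short sketch. It draws alignment columns with replacement, which is the general idea behind bootstrap pseudo-replicates; it is not Seqboot's implementation, and the function name `bootstrap_alignment` is hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one bootstrap pseudo-replicate of an alignment.

    alignment maps sequence names to aligned sequences of equal length.
    Columns are sampled with replacement, and the same sampled columns
    are used for every sequence so that each column stays intact.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    n_cols = lengths.pop()
    columns = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }
```

A distance matrix and tree would be computed for each replicate, and the consensus over all replicate trees yields the bootstrap support values.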

Pipeliner in Phylemon 2.0 adds PhyML and PhyML-Best-AIC-tree
to select the best tree after comparing the
AIC (Akaike Information Criterion) estimates of all DNA or
protein models.

Users can add tools from the list of tools and connect 
them using the 'create link' option. Once all the options of 
the tools are completed, the user can run and save the 
pipeline for future jobs. 
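
The idea of a saved, linked chain of tools applied to many genes can be sketched as a small data structure. This is an illustrative model only and says nothing about how Pipeliner is implemented; the `Pipeline` class, the `stand_in` helper and the string-based steps are all hypothetical placeholders for real program runs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Step = Callable[[str], str]


@dataclass
class Pipeline:
    """An ordered chain of named steps; each step consumes the previous output."""
    steps: List[Tuple[str, Step]] = field(default_factory=list)

    def add(self, name: str, func: Step) -> "Pipeline":
        """Append a step, i.e. link it to the current end of the chain."""
        self.steps.append((name, func))
        return self

    def run(self, data: str) -> str:
        """Run every step in order on one input."""
        for _name, func in self.steps:
            data = func(data)
        return data

    def run_many(self, inputs: Dict[str, str]) -> Dict[str, str]:
        """Apply the same pipeline to every gene in a data set."""
        return {gene: self.run(data) for gene, data in inputs.items()}


def stand_in(label: str) -> Step:
    """Placeholder for a real tool: it only records that the tool was applied."""
    def step(data: str) -> str:
        return f"{label}({data})"
    return step


pipeline = Pipeline()
for tool in ["ClustalW", "Seqboot", "DnaDist", "Neighbor", "Consense"]:
    pipeline.add(tool, stand_in(tool))
results = pipeline.run_many({"geneA": "seqsA", "geneB": "seqsB"})
```

Because the pipeline is defined once and applied per gene, the same analysis settings are guaranteed for every gene in the set.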



UTILITIES 

Phylemon 2.0 implements three new utilities. The first is
TrimAl v1.3 (22), which automatically removes poorly
aligned regions from MSAs. The user can select a set of
columns to be removed or set specific thresholds based on
the fraction of gaps or the similarity of residues in a
column. Additionally, TrimAl implements a series of
automated algorithms that apply different optimized
thresholds, based on the characteristics of each alignment.
The second is CDS-ProtAl, a new tool for multiple coding
sequence alignment based on protein sequences. This
program translates the coding DNA sequences provided as input
(universal genetic code) and uses Muscle to align the resulting
proteins with default parameters, capped at 5 h of running time
or 9999 iterations. Finally,
ReadAl v1.3, a new tool for conversion among the
most popular file formats used in phylogeny, has been
included.
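
Two of the operations described above can be illustrated with short sketches: removing columns by gap fraction, and mapping coding sequences onto a protein alignment. Neither reproduces TrimAl or CDS-ProtAl; the function names are hypothetical, and the second function assumes that the coding sequence has no trailing stop codon and matches the protein residue for residue.

```python
def trim_gappy_columns(alignment, max_gap_fraction):
    """Drop alignment columns whose fraction of gaps exceeds the threshold.

    alignment maps sequence names to aligned sequences of equal length.
    """
    names = list(alignment)
    if not names:
        return {}
    n_cols = len(alignment[names[0]])
    if any(len(alignment[n]) != n_cols for n in names):
        raise ValueError("all sequences must have the same length")
    keep = []
    for i in range(n_cols):
        gaps = sum(1 for n in names if alignment[n][i] == "-")
        if gaps / len(names) <= max_gap_fraction:
            keep.append(i)
    return {n: "".join(alignment[n][i] for i in keep) for n in names}


def thread_codons(aligned_protein, cds):
    """Project a coding sequence onto its aligned protein.

    Each residue is replaced by its codon and each gap by '---'.
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = len(aligned_protein) - aligned_protein.count("-")
    if n_residues != len(codons):
        raise ValueError("protein and coding sequence lengths do not match")
    out = []
    next_codon = 0
    for residue in aligned_protein:
        if residue == "-":
            out.append("---")
        else:
            out.append(codons[next_codon])
            next_codon += 1
    return "".join(out)
```

Threading codons onto the protein alignment keeps the reading frame intact, which is what the downstream codon-based selection tests require.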

ETE v2.1 (23) allows users to visualize and interact
with trees. The new version allows rooting, collapsing,
expanding or swapping nodes and incorporates the possibility
to search for distances, support values or names
(including the use of Perl-based regular expressions).
These options and many others are available by clicking
on the nodes, or within the frame surrounding the tree, with
the left mouse button.

REGISTRATION, ACCOUNTS, PROJECTS AND SPACE

Phylemon 2.0 can be accessed by anonymous login or by 
registered users. The only difference between these choices 
is that registered users, from whom only an e-mail is 
required, can have many different projects and use the 
server to store up to 1.0 GB of data for future use. Files 
from anonymous users are deleted after 24 h. Projects and 
jobs in Phylemon 2.0 can be created, renamed and deleted 
by users. Jobs that are finished and waiting to be
visited, already visited, running, or waiting in the
queue are counted and colored green, blue, red and yellow,
respectively. Two icons give users access to file, project
and data management.

Technical details 

Phylemon 2.0 has been completely reengineered. The
server side is implemented in Java and the client side
in AJAX (Asynchronous JavaScript And XML);
client and server exchange data in JSON (JavaScript Object
Notation). Consequently, the new interface allows
asynchronous use of tools (a program can be left running and
its results viewed later) and includes new facilities
for the management of projects and jobs. Moreover,
a queue system has been implemented in the server. This
release makes intensive use of new web technologies
and standards, so the supported browsers for this
version are as follows: Chrome 7+, Firefox 3.5+, Safari
4+, Opera 10+ and Internet Explorer 8. Internet Explorer
6 and 7 are no longer supported. Pipeliner was developed
in HTML5 and JavaScript and makes use of Scalable Vector
Graphics (SVG); it therefore runs in Chrome 7+,
Firefox 4+ or Internet Explorer 9+. More details are available
at the Wiki-Help:
http://docs.bioinfo.cipf.es/phylemonwiki/doku.php.

DISCUSSION 

Molecular evolution, phylogenetics, phylogenomics and 
evolutionary hypothesis testing embrace a wide range of 
scientific enquiries in biology. Following the latest
developments in the field, Phylemon 2.0 combines tools and
programs ranging from the simplest distance phylogenetic 
reconstruction, or the basic relative rate test, to the newest 
ML model-averaged estimation of the tree topology or the 
analysis of molecular adaptation. The incorporation of 
new tools in Phylemon 2.0 extends its usefulness to 
advanced users trying to find answers to more complex 
questions of phylogeny and evolution in a web server. 
Phylemon 2.0 addresses an important requirement of 
users and students of evolution and phylogeny; namely, 
the need for a public web server providing a core set of 
format-compatible, classical and advanced tools truly 
integrated in an independent web platform. 